Distributed storage infrastructures require the use of data redundancy to achieve high data reliability. Unfortunately, the use
of redundancy introduces storage and communication overheads, which can either reduce the overall storage capacity of
the system or increase its costs. To mitigate the storage and communication overhead, different redundancy schemes have
been proposed. However, due to the great variety of underlying storage infrastructures and the different application needs,
optimizing these redundancy schemes for each storage infrastructure is cumbersome. The lack of rules to determine the
optimal level of redundancy for each storage configuration leads developers in industry to often choose simpler redundancy
schemes, which are usually not the optimal ones. In this paper we analyze the cost of different redundancy schemes and
derive a set of rules to determine which redundancy scheme minimizes the storage and the communication costs for a given
system configuration. Additionally, we use simulation to show that theoretically optimal schemes may not be viable in a
realistic setting where nodes can go off-line and repairs may be delayed. In these cases, we identify the trade-offs
between the storage and communication overheads of the redundancy scheme and its data reliability.

Categories and Subject Descriptors: E.4 [Coding and Information Theory]: Error Control Codes; E.5 [Files]: 
Backup/Recovery; C.2.4 [Computer-Communication Networks]: Distributed Systems; H.3.2 [Information Storage and
Retrieval]: Information Storage 

General Terms: Reliability, Performance

Additional Key Words and Phrases: erasure correction codes, data reliability, data redundancy, distributed storage systems, 
redundancy costs, regenerating codes, storage systems design. 

1. INTRODUCTION 

Distributed storage systems are widely used today for reasons of scalability and performance. There
are distributed file systems such as Google FS [Ghemawat et al. 2003], HDFS [Borthakur 2007],
GPFS [Schmuck and Haskin 2002] or Dynamo [Hastorun et al. 2007], and peer-to-peer (P2P) storage
applications like Wuala [WuaLa 2010] or OceanStore [Kubiatowicz et al. 2000].

To achieve high reliability in distributed storage systems, a certain level of data redundancy is
required. Unfortunately, the use of redundancy increases the storage and communication costs of
the system: (i) the space required to store each file is increased, and (ii) additional communication
bandwidth is required to repair lost data. It is important to optimize redundancy schemes in
order to minimize these storage and communication costs. For example, in data centers where
the energy cost associated with the storage sub-system represents about 40% of the energy
consumption of all the IT components [Guerra et al. 2010], minimizing storage cost can significantly
reduce the per-byte cost of the storage system. In less reliable infrastructures such as P2P
systems, where the storage capacity is mainly constrained by the cross-system communication
bandwidth [Blake and Rodrigues 2003], minimizing communication costs can increase the overall
storage capacity of the system.

Different redundancy schemes have been proposed to minimize the storage and communication
costs associated with redundancy. Redundancy schemes based on coding techniques such as
Reed-Solomon codes [Reed and Solomon 1960] or LDPC [Plank and Thomason 2004] achieve
significant storage savings as compared to simple replication [Rodrigues and Liskov 2005]
[Lin et al. 2004] [Weatherspoon and Kubiatowicz 2002] [Dimakis et al. 2007]. Moreover, recent
advances in network coding have led to the design of Regenerating Codes [Dimakis et al. 2007], which
reduce both the storage and the communication costs as compared to replication. While coding
schemes can provide cost-efficient redundancy in production environments [Zhang et al. 2010]
[Ford et al. 2010] [Fan et al. 2009], distributed storage designers are still slow in adopting advanced
coding schemes for their systems. In our opinion, one reason for this reluctance is that coding
schemes present too many configuration trade-offs, which makes it difficult to determine the optimal
configuration for a given storage infrastructure.

Besides coding or replication, one can also combine these two techniques into a hybrid redundancy
scheme. In some circumstances these hybrid redundancy schemes can reduce the costs of
coding schemes [Wu et al. 2005] [Haeberlen et al. 2005]. Besides reducing costs, there are other
reasons why maintaining whole file replicas in conjunction with encoded copies is advantageous: (i)
production systems using replication that want to reduce their costs without migrating their whole
infrastructure, (ii) peer-assisted cloud storage systems [Toka et al. 2010], like Wuala [WuaLa 2010],
that aim to reduce the outgoing cloud bandwidth by combining cloud storage with P2P storage,
and (iii) storage systems needing efficient file retrievals that cannot afford the computational costs
inherent in coding schemes. Unfortunately, there are no studies analyzing under which conditions
(i.e. node dynamics and network parameters) hybrid schemes can reduce the storage and
communication costs as compared to simple replication.

Due to the great variety of redundancy schemes, it is complex to determine which redundancy
scheme is the best for a given infrastructure that is defined by properties like size (number of storage
nodes), amount of stored data, node dynamics, and cross-system bandwidth. The aim of this paper
is to analyze the impact of these properties on the storage and communication costs of the
redundancy scheme. We focus our analysis on Regenerating Codes [Dimakis et al. 2007]. As we will
see in Section 4, Regenerating Codes provide a generic framework that also allows us to analyze
replication schemes and maximum-distance separable (MDS) codes such as Reed-Solomon codes
as specific instances of Regenerating Codes.

The main contributions of our paper are as follows: 



— This paper is the first to completely evaluate the storage and communication costs of Regenerating 
Codes under different system conditions. 

— For storage systems that need to maintain whole replicas of the stored files, we identify the
conditions where a hybrid scheme (replication+coding) can reduce the storage and communication
costs of a simple replication scheme.

— Finally, we evaluate through simulation the effects that different redundancy scheme configu- 
rations have on the scalability of the storage system. We show that some theoretically-optimal 
schemes cannot guarantee data reliability in realistic storage environments. 



The rest of the paper is organized as follows. In Section 2 we present the related work. In
Sections 3 and 4 we describe our storage model and Regenerating Codes. In Section 5 we analytically
evaluate the storage and communication costs of Regenerating Codes. In Section 6 we analyze a
hybrid redundancy scheme that combines Regenerating Codes and replication. Finally, in Section 7
we validate and extend our analytical results using simulations, and in Section 8 we state our
conclusions.



Cost Analysis of Redundancy Schemes for Distributed Storage Systems 






2. RELATED WORK 

Tolerating node failures is a key requirement to achieve data reliability in distributed storage sys- 
tems. Existing distributed storage systems use different strategies to cope with these node failures 
depending on whether these failures are transient — nodes reconnect without losing any data — or 
permanent — nodes disconnect and lose their data. In this section we present the existing techniques 
used to alleviate the costs caused by these transient and permanent node failures. 

Transient node failures cause temporary data unavailability that may prevent users from retrieving
their stored files. To tolerate transient node failures and guarantee high data availability, storage
systems need to introduce data redundancy. Redundancy ensures (with high probability) that files can
be retrieved even when some storage nodes are temporarily off-line. The simplest way to introduce
redundancy is by replicating each stored file. However, redundancy schemes based on coding
techniques can significantly reduce the amount of redundancy (and storage space) required while
achieving the same data reliability [Weatherspoon and Kubiatowicz 2002] [Bhagwan et al. 2002]. Lin et
al. [Lin et al. 2004] showed that such a reduction in redundancy is only possible when node on-line
availabilities are high. For example, nodes must be on-line more than 50% of the time when files are
stored occupying twice their original size, or more than 33% of the time when files occupy
three times their original size.

To cope with permanent node failures, storage systems need to repair the lost redundancy.
Unfortunately, repairing such lost redundancy introduces communication overheads since it requires
moving large amounts of data between nodes. Blake and Rodrigues [Blake and Rodrigues 2003]
demonstrated that the communication bandwidth used by these repairs can limit the scalability of
the system in three main situations: (i) when the node failure rate is high, (ii) when the
cross-system bandwidth is low, or (iii) when the system stores too much data. Additionally, Rodrigues and
Liskov [Rodrigues and Liskov 2005] compared replication and erasure codes in terms of communication
overheads and concluded that when on-line node availabilities are high, replication requires
less communication than erasure codes. These results pose a dilemma for storage designers: when
node on-line availabilities are high, erasure codes minimize storage overheads [Lin et al. 2004]
and replication minimizes communication overheads [Rodrigues and Liskov 2005].

In order to reduce communication overheads for erasure codes, Wu et al. [Wu et al. 2005]
proposed the use of a hybrid scheme combining erasure codes and replication. Although this
technique slightly increases the storage overhead, it can significantly reduce the communication
overhead of erasure codes when node on-line availabilities are high. Another technique used to
minimize the communication overhead is lazy redundancy maintenance [Kiran et al. 2004]
[Datta and Aberer 2006], which amortizes the costs of several consecutive repairs. However,
deferring repairs can reduce the amount of available redundancy, requiring extra redundancy to
guarantee the same data reliability. Furthermore, lazy repairs lead to spikes in the network resource
usage [Duminuco et al. 2007] [Sit et al. 2006].

New coding schemes such as Hierarchical Codes or tree-structured data regeneration have
also been proposed to reduce the communication overhead as compared to classical erasure
codes [Li et al. 2010; Duminuco and Biersack 2008]. These solutions propose storage optimizations
that exploit heterogeneities in node bandwidth and node availabilities. Finally, Dimakis et al.
presented Regenerating Codes [Dimakis et al. 2007] as a flexible redundancy scheme for distributed
storage systems. Regenerating Codes use ideas from network coding to define a new family of
erasure codes that can achieve different trade-offs in the optimization of storage and communication
costs. This flexibility allows the code to be adjusted to the underlying storage infrastructure. However,
there are no studies on how Regenerating Codes should be adapted to these infrastructures, or how
Regenerating Codes should be configured when combined with file replication in hybrid schemes.
In this paper we use Regenerating Codes [Dimakis et al. 2007] [Dimakis et al. 2010] as the basis
of our analysis on how to adapt and optimize redundancy schemes for different underlying storage
infrastructures and different application needs.

3. MODELING A GENERIC DISTRIBUTED STORAGE SYSTEM

We consider a storage system where nodes dynamically join and leave the system
[Duminuco et al. 2007] [Utard and Vernois 2004]. We assume that node lifetimes are random
and follow some specific distribution $L$. Because of these dynamics, the number of on-line nodes
at time $t$, $N_t$, is a random process that fluctuates over time. Once stationarity is reached, we
can replace $N_t$ by its limiting version $N = \lim_{t \to \infty} N_t$. Assuming that node arrivals follow
a Poisson process with a constant rate $\lambda$, the average number of nodes in the system is
$N = \lambda \cdot E[L]$ [Pamies-Juarez and García-López 2010]. Additionally, it has been observed in real
traces that during their lifetime in the system, nodes present several off-line periods caused by
transient failures [Steiner et al. 2007] [Guha et al. 2006]. To model these transient failures, we model
each node as an alternating process between on-line and off-line states. The sojourn times in these
states are random and follow two different distributions, $X_{on}$ and $X_{off}$ respectively. Using these
distributions we can measure the node on-line availability in stationary state as [Yao et al. 2006]:

\[ a = \frac{E[X_{on}]}{E[X_{on}] + E[X_{off}]}. \]

All the $N$ nodes in the system are responsible for storing a constant amount of data that is uniformly
distributed among them. To model different data granularities, we consider that this total
amount of stored data corresponds to $\mathcal{O}$ different data files of size $\mathcal{M}$ bytes. However, since each of
these files is stored with redundancy, the total disk space required to store each file is $R \cdot \mathcal{M}$, where
$R$ is the redundancy factor (or stretch factor). The value of $R$ is set to guarantee that files are always
available with a probability, $p$, that is very close to one.
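
As a rough illustration of this model, the sketch below computes its steady-state quantities from mean values. The function names and all numeric inputs are hypothetical, and the availability formula is the standard mean-on-time over mean-cycle-time result for an alternating on/off process, which is assumed here.

```python
def average_population(arrival_rate: float, mean_lifetime: float) -> float:
    """Average number of nodes, N = lambda * E[L] (Poisson arrivals assumed)."""
    return arrival_rate * mean_lifetime


def online_availability(mean_on: float, mean_off: float) -> float:
    """Stationary availability a of an alternating on/off process (assumed form)."""
    return mean_on / (mean_on + mean_off)


def stored_bytes(num_files: int, file_size: float, redundancy: float) -> float:
    """Total disk space used by O files of size M stored with stretch factor R."""
    return num_files * redundancy * file_size


# Hypothetical inputs, chosen only for illustration.
N = average_population(arrival_rate=0.5, mean_lifetime=2000.0)  # nodes
a = online_availability(mean_on=3.0, mean_off=1.0)              # fraction of time on-line
total = stored_bytes(num_files=100, file_size=1024.0, redundancy=2.5)
print(N, a, total)
```

With these inputs a node is on-line three quarters of the time, and the system holds on average one thousand nodes.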

When a node reaches the end of its life, it abandons the system, losing all the data stored on
it. A repair process is responsible for recreating the lost redundancy and for ensuring that the retrieve
probability, $p$, is not compromised. There are three main approaches used to recreate redundancy
when nodes fail:

(1) Eager repairs: Lost redundancy is repaired on demand immediately after a node failure is 
detected. 

(2) Lazy repairs: The system waits until a certain number of nodes have failed and repairs them all
at once.

(3) Proactive repairs: The system schedules the insertion of new redundancy at a constant rate, 
which is set according to the average node failure rate. 

In our storage model we assume the use of proactive repairs. Compared to eager repairs,
proactive repairs simplify the analysis of the communication costs. Furthermore, while lazy
repair can reduce the maintenance costs by amortizing the communication costs across several
repairs [Datta and Aberer 2006], it presents some important drawbacks: (i) delaying repairs leads to
periods of low redundancy that make the system vulnerable; (ii) lazy repairs cause network
resource usage to occur in bursts, creating spikes of system utilization [Duminuco et al. 2007]. By
adopting a proactive repair strategy, communication overheads are evenly distributed in time and
we can analyze the storage system in its steady state [Duminuco et al. 2007].



4. REGENERATING CODES 

Regenerating Codes [Dimakis et al. 2007] are a family of erasure codes that allow trading off
communication cost for storage cost and vice versa. To store a file of size $\mathcal{M}$ bytes, Regenerating Codes
generate $n$ blocks, each to be stored on a different storage node. Each of these storage blocks has
a size of $\alpha$ bytes, which makes the file stretch factor $R = n\alpha/\mathcal{M}$. When a storage node leaves
the system or when a failure occurs, a new node can repair the lost block by downloading a repair
block of size $\beta$ bytes, $\beta \le \alpha$, from any set of $d$ out of the $n-1$ alive nodes ($k \le d \le n-1$). We will
refer to $d$ as the repair degree. The total amount of data received by the repairing node, $\gamma = d\beta$,
is called the repair bandwidth. Finally, a node can reconstruct the file by downloading $\alpha$ bytes
(the entire storage block) from $k$ different nodes. In Figure 1 we depict the basic operations of file
retrieve and block repair for a Regenerating Code. The labels at the edges indicate the amount of
data transmitted between nodes during each operation.

Fig. 1. Scheme for the repair and retrieve operations of Regenerating Codes.

Dimakis et al. [Dimakis et al. 2007] gave the conditions that the set of parameters
$(n, k, d, \alpha, \gamma = d\beta)$ must satisfy to construct a valid Regenerating Code. Basically, once the subset of
parameters $(n, k, d)$ is fixed, Dimakis et al. obtained an analytical expression for the relationship between the
values of $\alpha$ and $\gamma$. This $\alpha$-$\gamma$ relationship presents a trade-off curve: the larger $\alpha$, the smaller $\gamma$,
and vice versa. This means that it is impossible to simultaneously minimize both communication
cost and storage cost. Since the maximum storage capacity of the system can be constrained
either by bandwidth bottlenecks or by disk storage bottlenecks, there are two extreme $(\alpha, \gamma)$-points of
this trade-off curve that are of special interest w.r.t. maximizing the storage capacity. The first is
the point where the storage block size $\alpha$ per node is minimized, which is referred to as the
Minimum Storage Regenerating (MSR) code. The second is the point where the repair bandwidth $\gamma$ is
minimized, which is referred to as the Minimum Bandwidth Regenerating (MBR) code. According
to [Dimakis et al. 2010], the storage block size ($\alpha$) and the repair bandwidth ($\gamma$) for MSR and MBR
codes are:

\[ (\alpha_{MSR}, \gamma_{MSR}) = \left( \frac{\mathcal{M}}{k},\; \frac{\mathcal{M}}{k} \cdot \frac{d}{d-k+1} \right) \tag{1} \]

\[ (\alpha_{MBR}, \gamma_{MBR}) = \left( \frac{\mathcal{M}}{k} \cdot \frac{2d}{2d-k+1},\; \frac{\mathcal{M}}{k} \cdot \frac{2d}{2d-k+1} \right) \tag{2} \]


There are two particular MSR configurations of special interest:

— Maximum-distance separable (MDS) codes: In MSR codes, when $d = k$, we obtain $\beta_{MSR} = \alpha_{MSR}$
and the Regenerating Code behaves exactly like a traditional MDS code such as a Reed-Solomon
code [Reed and Solomon 1960]. In this case, the repair bandwidth, $\gamma_{MDS}[k = d]$, is
identical to the size of the original file, $\mathcal{M}$:

\[ \gamma_{MDS}[k = d] = d\,\beta_{MSR} = k\,\alpha_{MSR} = k \frac{\mathcal{M}}{k} = \mathcal{M}. \]

— File replication: In MSR codes, when $k = d = 1$, the code becomes a simple replication scheme
where the $n$ storage nodes each store a complete copy of the original file. For $k = d = 1$, the
storage block size, $\alpha_{MSR}[k = d = 1]$, and the repair bandwidth, $\gamma_{MSR}[k = d = 1]$, are equal to
the size of the original file: $\alpha_{MSR}[k = d = 1] = \gamma_{MSR}[k = d = 1] = \mathcal{M}$.

In Table I we summarize the symbols used throughout the paper.

Table I. Symbols used.

  Symbol          Meaning
  $N$             Average number of storage nodes.
  $\lambda$       Node arrival/departure rate (nodes/sec).
  $L$             Distribution of the node lifetime (sec).
  $a$             Node on-line availability.
  $\mathcal{O}$   Number of stored files.
  $\mathcal{M}$   Size of the stored files (bytes).
  $\omega$        Service bandwidth of each node (KBps).
  $p$             Data availability. Probability of detecting $k$ blocks on-line.
  $n$             Number of storage blocks.
  $k$             Retrieval degree: number of blocks required for retrieval of original data.
  $d$             Repair degree: number of blocks required for repair.
  $\alpha$        Storage block size.
  $\beta$         Repair block size.
  $\gamma$        Repair bandwidth.
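
The two operating points and their special cases can be checked numerically. The following is a minimal sketch, assuming the MSR and MBR expressions of [Dimakis et al. 2010] in the form used by the cost formulas of Section 5; the function names and the file size of 1000 bytes are hypothetical.

```python
from fractions import Fraction


def msr_point(M: int, k: int, d: int):
    """(alpha, gamma) at the Minimum Storage Regenerating point."""
    if not 1 <= k <= d:
        raise ValueError("expected 1 <= k <= d")
    alpha = Fraction(M, k)
    gamma = Fraction(M * d, k * (d - k + 1))
    return alpha, gamma


def mbr_point(M: int, k: int, d: int):
    """(alpha, gamma) at the Minimum Bandwidth Regenerating point."""
    if not 1 <= k <= d:
        raise ValueError("expected 1 <= k <= d")
    alpha = Fraction(2 * d * M, k * (2 * d - k + 1))
    return alpha, alpha  # at the MBR point the block size equals the repair bandwidth


M = 1000  # hypothetical file size in bytes
print(msr_point(M, k=5, d=7))   # smaller blocks, larger repair traffic
print(mbr_point(M, k=5, d=7))   # larger blocks, smaller repair traffic
print(msr_point(M, k=5, d=5))   # MDS case: repair bandwidth equals M
print(msr_point(M, k=1, d=1))   # replication: block size and repair bandwidth equal M
```

Exact fractions are used so that the special cases come out as exact equalities instead of floating-point approximations.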



5. COST ANALYSIS 
5.1. Redundancy Cost 

In Section 4 we defined data redundancy as $R = n\alpha/\mathcal{M}$. In this section we aim to measure
the minimum $R$ required to guarantee a desired file retrieve probability $p$. Since in Regenerating
Codes the retrieval process needs to download $k$ different blocks out of the total $n$ blocks, the
retrieve probability $p$ is measured as [Lin et al. 2004]:

\[ p = \sum_{i=k}^{n} \binom{n}{i} a^{i} (1-a)^{n-i}. \tag{3} \]

Given the values of $k$, $a$ and $p$, we can use eq. (3) to determine the minimum number of redundant
blocks required to guarantee a certain retrieve probability $p$ using the function $\eta$:

\[ \eta[k, a, p] = \min \left\{ n' \ge k : \sum_{i=k}^{n'} \binom{n'}{i} a^{i} (1-a)^{n'-i} \ge p \right\}. \tag{4} \]

Note that the number of redundant blocks required to achieve $p$ is a function of the retrieve degree, $k$,
the node on-line availability, $a$, and $p$. In the rest of this paper we will use the notation $\eta[k, a, p]$ to
refer to the number of storage blocks $n$ required to achieve a retrieve probability $p$ for the specific $k$
and $a$ values.
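
The function $\eta$ has no closed form, but it is easy to evaluate by direct search. The sketch below is one possible way to do so, assuming the binomial form of the retrieve probability; the names `retrieve_probability`, `eta` and the bound `n_max` are illustrative. Exact rationals are used so that cases lying exactly on the threshold are not affected by floating-point rounding.

```python
from fractions import Fraction
from math import comb


def retrieve_probability(n: int, k: int, a: Fraction) -> Fraction:
    """Probability that at least k of the n blocks are on-line, eq. (3)."""
    return sum(comb(n, i) * a**i * (1 - a)**(n - i) for i in range(k, n + 1))


def eta(k: int, a: Fraction, p: Fraction, n_max: int = 10_000) -> int:
    """Smallest n >= k whose retrieve probability reaches p, eq. (4)."""
    for n in range(k, n_max + 1):
        if retrieve_probability(n, k, a) >= p:
            return n
    raise ValueError("no n <= n_max reaches the target probability")


p = Fraction(999_999, 1_000_000)
print(eta(1, Fraction(1, 2), p))     # replication with a = 0.5  -> 20
print(eta(5, Fraction(99, 100), p))  # k = 5 with a = 0.99       -> 8
```

The linear search is sufficient because the retrieve probability grows with $n$ for fixed $k$ and $a$, so the first $n$ that reaches $p$ is the minimum.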

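Since data redundancy is $R = n\alpha/\mathcal{M}$, we can obtain the redundancy required by MSR and MBR
codes, $R_{MSR}$ and $R_{MBR}$ respectively, by substituting $\alpha$ with the expressions given for $\alpha$ in eq. (1)
and eq. (2):

\[ R_{MSR} = \frac{\eta[k,a,p] \cdot \alpha_{MSR}}{\mathcal{M}} = \frac{\eta[k,a,p] \cdot (\mathcal{M}/k)}{\mathcal{M}} = \frac{\eta[k,a,p]}{k} \tag{5} \]

\[ R_{MBR} = \frac{\eta[k,a,p] \cdot \alpha_{MBR}}{\mathcal{M}} = \frac{\eta[k,a,p] \cdot \left( 2d\mathcal{M} / (k(2d-k+1)) \right)}{\mathcal{M}} = \frac{2d \cdot \eta[k,a,p]}{k(2d-k+1)} \tag{6} \]

Using these expressions we can state the following lemma: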

Lemma 5.1. For $n$, $k$ and $d$ fixed, the redundancy required by MSR codes, $R_{MSR}$, is always
smaller than or equal to the redundancy required by MBR codes, $R_{MBR}$.

(b) Redundancy for MBR codes.
(c) Value of $\eta[k, a, 0.999999]$.

Fig. 2. Redundancy $R$ required to achieve a retrieve probability $p = 0.999999$ for MSR and MBR codes as a function of
the retrieve degree $k$. Each plot in (a) and (b) depicts the redundancy evaluated using eq. (5) and eq. (6) for different values
of $d$ and different values of the node on-line availability $a$. In (c) we plot the number of storage blocks $n$ required to achieve
the retrieve probability $p$ for each case.

Proof. We can state the lemma as $R_{MSR} \le R_{MBR}$. Using equations (5) and (6) we obtain:

\[ \frac{\eta[k,a,p] \cdot \alpha_{MSR}}{\mathcal{M}} \le \frac{\eta[k,a,p] \cdot \alpha_{MBR}}{\mathcal{M}} \]

\[ \alpha_{MSR} \le \alpha_{MBR}, \]

which is true by the definition of MSR codes and MBR codes [Dimakis et al. 2007]. □

In Figures 2a and 2b we plot the redundancy $R$ required to achieve a retrieve probability $p =
0.999999$ for MSR and MBR codes. We plot the values of $R$ as a function of the retrieve degree,
$k$, and for different node availabilities, $a$. Additionally, for MBR codes we also depict the values of
$R_{MBR}$ for the two extreme repair degree values: $d = k$ and $d = n - 1$. We do not evaluate $R_{MSR}$ for
different $d$ values because $R_{MSR}$ is independent of $d$ (see eq. (5)). In Figure 2c we use eq. (4) to plot
the number of blocks, $\eta[k, a, p]$, used in Figures 2a and 2b for the retrieve probability $p = 0.999999$.

In Figure 2 we can see that for MSR and MBR, increasing $k$ reduces $R$ and, therefore, reduces
storage costs. Additionally, comparing Figures 2a and 2b we can appreciate the consequences of
Lemma 5.1: for a given node availability, $a$, and a retrieve degree $k$, the redundancy required for
MSR codes is always smaller than the redundancy required for MBR codes. Finally, we can see that
$R$ first decreases quickly with increasing $k$ before it reaches its asymptotic value. There is no point
in choosing $k$ very large to minimize the storage costs of MSR and MBR codes, since large $k$ values
induce a very high computational cost for coding and decoding [Duminuco and Biersack 2009]. We
therefore recommend using values of $k$ where the redundancy $R$ starts approaching the asymptote,
namely $k = 5$ for $a = 0.99$, $k = 20$ for $a = 0.75$ and $k = 50$ for $a = 0.5$. In Table II we provide
the redundancy savings achieved by using these $k$ values.

Table II. Storage space savings for adopting a Regenerating Code instead of
replication. We use different $k$ values for each on-line node availability and a
target retrieve probability of $p = 0.999999$.

                       a = 0.5     a = 0.75    a = 0.99
                       k = 50      k = 20      k = 5
  MSR                  84%         77%         47%
  MBR (d = k)          69%         55%         11%
  MBR (d = n - 1)      81%         70%         25%
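
The savings can be recomputed from eq. (5) and eq. (6). The sketch below does this for $a = 0.99$ and $k = 5$ only. It assumes $n = 8$ storage blocks for the coded schemes and $n = 3$ replicas for replication, both obtained from eq. (4) with $p = 0.999999$; the helper names are hypothetical.

```python
from fractions import Fraction


def r_msr(n: int, k: int) -> Fraction:
    """Redundancy of an MSR code, eq. (5)."""
    return Fraction(n, k)


def r_mbr(n: int, k: int, d: int) -> Fraction:
    """Redundancy of an MBR code, eq. (6)."""
    return Fraction(2 * d * n, k * (2 * d - k + 1))


def saving(r_code: Fraction, r_replication: Fraction) -> Fraction:
    """Fraction of storage saved with respect to replication."""
    return 1 - r_code / r_replication


n, k = 8, 5               # assumed block count for a = 0.99, k = 5
r_rep = r_msr(3, 1)       # replication is the MSR code with k = d = 1 (3 replicas assumed)

for label, r in [("MSR", r_msr(n, k)),
                 ("MBR (d = k)", r_mbr(n, k, d=k)),
                 ("MBR (d = n - 1)", r_mbr(n, k, d=n - 1))]:
    print(f"{label}: R = {float(r):.2f}, saving = {float(saving(r, r_rep)):.0%}")
```

Under these assumptions the script prints savings of 47%, 11% and 25%.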



5.2. Communication Costs 

When a node fails, the system must repair all the data blocks stored on the failed node. Repairing
each of these blocks requires transferring data between nodes, which entails a communication cost.
In this section we measure the minimum per-node bandwidth required to sustain the overall repair
traffic of the storage system. We will first compute the total amount of data that is transferred within
the system during a period of time $\Delta$:

\[ \text{data transferred during } \Delta = \text{nodes failed during } \Delta \times \text{blocks stored per node} \times \text{traffic to repair one block}. \tag{7} \]

Assuming that there are $N$ nodes with an average lifetime $E[L]$, the average number of nodes that
fail during a period $\Delta$ is $\Delta N / E[L]$ [Duminuco et al. 2007]. Additionally, assuming that data blocks
are uniformly distributed between all storage nodes, the average number of blocks stored per node
is $n \cdot \mathcal{O} / N$. Finally, since the traffic required to repair one failed block is $\gamma$, we can rewrite eq. (7)
as:

\[ \text{data transferred during } \Delta = \frac{\Delta N}{E[L]} \cdot \frac{n\,\mathcal{O}}{N} \cdot \gamma = \frac{\Delta\,\gamma\,n\,\mathcal{O}}{E[L]}. \tag{8} \]

Then, the minimum per-node bandwidth, $W$, required to ensure that all stored data can be
successfully repaired is the ratio between the amount of data transmitted per unit of time (in seconds)
and the average number of on-line nodes, $aN$:

\[ W = \frac{\text{data transferred during } \Delta}{\Delta \times \text{avg. number of on-line nodes}} = \gamma \, \frac{n\,\mathcal{O}}{a\,N\,E[L]}. \tag{9} \]

Assuming that the repair bandwidth, $\gamma$, is given in KB, and the node lifetime, $L$, in seconds, then
the minimum per-node bandwidth $W$ is expressed in KBps. Assuming that the upload bandwidth
of each node is always smaller than or equal to the download bandwidth, this minimum per-node
bandwidth, $W$, represents the minimum upload bandwidth required by each node.
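
The chain from eq. (7) to eq. (9) can be followed with a small numeric sketch. All parameter values below are hypothetical, and the function names are not part of the model; the point is only that the observation period cancels out.

```python
def data_transferred(delta_s, gamma_kb, n, num_files, mean_lifetime_s):
    """KB moved by repairs during delta_s seconds, eq. (8)."""
    return delta_s * gamma_kb * n * num_files / mean_lifetime_s


def per_node_bandwidth(gamma_kb, n, num_files, availability, num_nodes, mean_lifetime_s):
    """Minimum per-node bandwidth W in KBps, eq. (9)."""
    delta_s = 1.0  # any period works: delta appears in numerator and denominator
    total = data_transferred(delta_s, gamma_kb, n, num_files, mean_lifetime_s)
    return total / (delta_s * availability * num_nodes)


# Hypothetical system: 1000 nodes, 100 files, 10 blocks per file,
# 1000 KB of repair traffic per block, mean lifetime of 10000 s.
w = per_node_bandwidth(gamma_kb=1000, n=10, num_files=100,
                       availability=0.5, num_nodes=1000, mean_lifetime_s=10_000)
print(f"W = {w:.3f} KBps")
```

Note that $N$ cancels in eq. (8) but reappears in eq. (9), because the repair traffic is shared among the $aN$ nodes that are on-line.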

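If we use the values of the repair bandwidth $\gamma$ given in equations (1) and (2), we obtain the
minimum per-node bandwidth for each Regenerating Code configuration:

\[ W_{MSR} = \gamma_{MSR} \cdot \frac{\eta[k,a,p]\,\mathcal{O}}{a\,N\,E[L]} = \frac{\mathcal{M}}{k} \cdot \frac{d}{d-k+1} \cdot \frac{\eta[k,a,p]\,\mathcal{O}}{a\,N\,E[L]} = \frac{d \cdot \eta[k,a,p]}{a\,k\,(d-k+1)} \cdot \frac{\mathcal{O}\mathcal{M}}{N\,E[L]} \tag{10} \]

\[ W_{MBR} = \gamma_{MBR} \cdot \frac{\eta[k,a,p]\,\mathcal{O}}{a\,N\,E[L]} = \frac{\mathcal{M}}{k} \cdot \frac{2d}{2d-k+1} \cdot \frac{\eta[k,a,p]\,\mathcal{O}}{a\,N\,E[L]} = \frac{2d \cdot \eta[k,a,p]}{a\,k\,(2d-k+1)} \cdot \frac{\mathcal{O}\mathcal{M}}{N\,E[L]} \tag{11} \]

Taking these two expressions we can state the following lemma: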


Lemma 5.2. For the same $n$, $k$ and $d$ parameters, the per-node bandwidth required by MBR
codes, $W_{MBR}$, is always smaller than or equal to the per-node bandwidth required by MSR codes,
$W_{MSR}$.

Fig. 3. We use eq. (10) to show the per-node bandwidth required to achieve $p = 0.999999$ for MSR codes.

Proof. We can state the lemma as $W_{MBR} \le W_{MSR}$. Using equations (10) and (11) we obtain:

\[ \gamma_{MBR} \cdot \frac{\eta[k,a,p]\,\mathcal{O}}{a\,N\,E[L]} \le \gamma_{MSR} \cdot \frac{\eta[k,a,p]\,\mathcal{O}}{a\,N\,E[L]} \]

\[ \gamma_{MBR} \le \gamma_{MSR}, \]

which is true by the definition of MSR codes and MBR codes from [Dimakis et al. 2007]. □
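
Lemma 5.2 can also be checked numerically over a grid of valid configurations. The sketch below assumes the normalized forms of eq. (10) and eq. (11), with the factor that does not depend on $(n, k, d)$ set to one; the helper names and the grid bounds are arbitrary choices.

```python
from fractions import Fraction


def w_msr(n: int, k: int, d: int, a: Fraction) -> Fraction:
    """Normalized per-node bandwidth of an MSR code, eq. (10)."""
    return Fraction(d * n, k * (d - k + 1)) / a


def w_mbr(n: int, k: int, d: int, a: Fraction) -> Fraction:
    """Normalized per-node bandwidth of an MBR code, eq. (11)."""
    return Fraction(2 * d * n, k * (2 * d - k + 1)) / a


a = Fraction(3, 4)
violations = [
    (n, k, d)
    for n in range(2, 40)
    for k in range(1, n)
    for d in range(k, n)       # valid repair degrees: k <= d <= n - 1
    if w_mbr(n, k, d, a) > w_msr(n, k, d, a)
]
print("violations of Lemma 5.2:", len(violations))
print("equal for k = d = 1:", w_mbr(8, 1, 1, a) == w_msr(8, 1, 1, a))
```

The two bandwidths coincide only for $k = 1$; for any $k > 1$ the MBR value is strictly smaller, which follows from comparing $2d/(2d-k+1)$ with $d/(d-k+1)$.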



In the rest of this section we analyze the per-node bandwidth requirements, $W$, for MSR and
MBR codes. Since in eq. (10) and eq. (11) the term $\frac{\mathcal{O}\mathcal{M}}{N\,E[L]}$ does not depend on the Regenerating Code
parameters $n$, $k$, $d$, we will assume that $\frac{\mathcal{O}\mathcal{M}}{N\,E[L]} = 1$. To obtain the minimum per-node bandwidth,
we simply have to multiply $W$ by $\frac{\mathcal{O}\mathcal{M}}{N\,E[L]}$.

Communication Cost for MSR Codes. In Figure 3 we use eq. (10) to analyze the per-node
bandwidth requirements of MSR codes when the required retrieve probability is $p = 0.999999$. We plot
the results for $d = k$ and $d = n - 1$ and for three different on-line node availabilities:

— For $d = k$ we can see in Figure 3a that the per-node bandwidth of an MDS code, such as a
Reed-Solomon code, is linear in $k$. In this case, the lowest per-node bandwidth is achieved when $k = 1$,
which corresponds to a simple replication scheme.

— For $d = n - 1$, however, we can see in Figure 3b that the per-node bandwidth is asymptotically
decreasing in $k$. As already noted, we recommend choosing $k = 20$ when $a = 0.75$ and
$k = 50$ when $a = 0.5$. Finally, we can see that for $a = 0.99$ the per-node bandwidth is not an
asymptotically decreasing function: as $a$ tends to one, the number of required blocks, $\eta[k, a, p]$, tends to $k$ (see
eq. (4)) and the case $d = n - 1$ becomes identical to the case $d = k$, which is depicted in Figure 3a.

In Figure 3a we saw that MDS codes ($d = k$; $k > 1$) do not reduce the per-node bandwidth
as compared to replication ($d = k = 1$), while in Figure 3b we saw that for $d > k$ an MSR code
can reduce the bandwidth as compared to replication except for high node on-line availabilities
($a = 0.99$). We now want to determine the maximum node on-line availability, $a$, for which an MSR
code can reduce the per-node bandwidth requirement as compared to replication. Let us denote by
$W_{MSR}[k = d = 1]$ the per-node bandwidth required by replication and by $W_{MSR}[k > 1, d > k]$
the per-node bandwidth required by an MSR code. Then, an MSR code reduces the bandwidth required by
replication when the following inequality holds:

\[ W_{MSR}[k = d = 1] > W_{MSR}[k > 1, d > k]. \tag{12} \]

Table III. Minimum $d$ values to construct MSR codes that require less repair
bandwidth than simple file replication. The target retrieve probability is $p = 0.999999$.

                       Minimum repair degree satisfying eq. (12) and the value of n.
  Node availability    k = 50             k = 20             k = 5
  a = 0.5              n = 159; d = 59    n = 81; d = 24     n = 36; d = 7
  a = 0.75             n = 95; d = 61     n = 47; d = 25     n = 20; d = 7
  a = 0.9              n = 71; d = 65     n = 34; d = 27     n = 13; d = 8
  a = 0.92             n = 69; d = 64     n = 32; d = 26     n = 12; d = 7
  a = 0.95             n = 64; d = -      n = 29; d = 27     n = 11; d = 8
  a = 0.97             n = 61; d = -      n = 27; d = -      n = 10; d = 9
  a = 0.99             n = 57; d = -      n = 25; d = -      n = 8; d = -

(a) MBR when $d = k$.
(b) MBR when $d = n - 1$.

Fig. 4. Per-node bandwidth required to achieve $p = 0.999999$ for MBR codes using eq. (9).

Table III shows the minimum $d$ that satisfies the inequality defined in eq. (12) for different
on-line node availabilities, $a$, and different retrieve degrees $k$. We additionally provide the number of
storage blocks, $n$, required to achieve $p = 0.999999$. We can see that for low node availabilities
small values of $d$, slightly larger than $k$, are sufficient to reduce the per-node bandwidth required
by replication. However, for high on-line node availabilities, the minimum value of $d$ satisfying
eq. (12) becomes larger than $n - 1$, which is not a valid Regenerating Code configuration. The
on-line availability at which this happens becomes higher for low $k$ values, namely $a \ge 0.95$ for $k = 50$,
$a \ge 0.97$ for $k = 20$ and $a \ge 0.99$ for $k = 5$. We can generally state that for high on-line node
availabilities, replication becomes more bandwidth efficient than any MSR code, which confirms the
result obtained by Rodrigues and Liskov [Rodrigues and Liskov 2005].
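
The search for the minimum repair degree can be sketched as follows. Both sides of eq. (12) share the factor $1/a$ and the normalization term, so only $n$, $k$, $d$ and the number of replicas matter. The function name is hypothetical, and the replica counts used below (20, 10 and 3 for $a = 0.5$, $0.75$ and $0.99$) are an assumption derived from eq. (4) with $k = 1$.

```python
from typing import Optional


def min_repair_degree(k: int, n: int, n_replication: int) -> Optional[int]:
    """Smallest d with k < d <= n - 1 that satisfies eq. (12), or None if there is none."""
    for d in range(k + 1, n):
        # d * n / (k * (d - k + 1)) < n_replication, cross-multiplied to stay in integers
        if d * n < n_replication * k * (d - k + 1):
            return d
    return None


print(min_repair_degree(k=5, n=36, n_replication=20))   # a = 0.5
print(min_repair_degree(k=50, n=95, n_replication=10))  # a = 0.75
print(min_repair_degree(k=5, n=8, n_replication=3))     # a = 0.99: no valid d
```

Returning `None` corresponds to the configurations where the required $d$ would exceed $n - 1$.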



Communication Cost for MBR Codes. In Figure |4] we plot the required per-node bandwidth of 
MBR codes for d = k and d ~ n ~ 1. For MBR codes, in difference to MSR codes, we can see 
that for both d values the required per-node bandwidth W asymptotically decreases with increasing 
fc and we can state; 

Remark 5.3. For MBR codes Wmbr \k = fc'] > Wmbr [fc = fc' + 1]. 

From Lemma|52]we know that for the same configuration, MBR codes are more bandwidth efficient 
than MSR codes. Using Remark l53] we can now state that all MBR codes are also more bandwidth 
efficient than simple replication, which is a special case of MSR: 

Lemma 5.4. The per-node bandwidth requirements of MBR codes are lower than or equal to 
the per-node bandwidth requirements of simple replication: Wmbr < Wmsr [k = d = 1]. 

Fig. 5. Reduction of the communication cost by adopting an MBR code instead of replication as a function of k for a retrieve probability of p = 0.999999.

Proof. If this lemma is true, then the per-node bandwidth of the MBR configuration that consumes the most bandwidth must be lower than or equal to the per-node bandwidth of replication. Since W_MBR is largest for k = 1 (see Remark 5.3), we can rewrite this lemma as W_MBR[k = d = 1] ≤ W_MSR[k = d = 1]. To prove it by contradiction we assume that W_MBR[k = d = 1] > W_MSR[k = d = 1]. Using equations (10) and (11) we obtain:

\begin{align*}
\gamma_{\mathrm{MBR}}[k = d = 1] \cdot \frac{\eta[1,a,p]\,O}{a\,N\,E[L]} &> \gamma_{\mathrm{MSR}}[k = d = 1] \cdot \frac{\eta[1,a,p]\,O}{a\,N\,E[L]} \\
\gamma_{\mathrm{MBR}}[k = d = 1] &> \gamma_{\mathrm{MSR}}[k = d = 1] \\
1 &> 1,
\end{align*}

which is a contradiction. □
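As a short worked check of the last step of the proof, assume the standard Regenerating Code repair bandwidths of [Dimakis et al. 2007]. These closed forms are not restated in this section, so they are quoted here as an assumption about the earlier definitions; they agree with the factors that appear in eqs. (13), (14), (19) and (20).

```latex
\gamma_{\mathrm{MSR}} = \frac{\mathcal{M}\,d}{k\,(d-k+1)},
\qquad
\gamma_{\mathrm{MBR}} = \frac{2\,\mathcal{M}\,d}{k\,(2d-k+1)}.

% Setting k = d = 1 in both expressions:
\gamma_{\mathrm{MSR}}[k=d=1] = \frac{\mathcal{M}\cdot 1}{1\cdot(1-1+1)} = \mathcal{M},
\qquad
\gamma_{\mathrm{MBR}}[k=d=1] = \frac{2\,\mathcal{M}\cdot 1}{1\cdot(2-1+1)} = \mathcal{M}.
```

Both sides therefore equal the object size, and dividing the assumed strict inequality by this common value gives 1 > 1.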

In Figure 5 we plot the communication savings a storage system makes when using an MBR code instead of replication. The savings have the same asymptotic behavior as the bandwidth requirements, W_MBR, depicted in Figure 4. Since for MBR codes α_MBR = γ_MBR, i.e. the storage block size is the same as the repair bandwidth, the communication savings for MBR are the same as
the storage savings listed in Table HH 

6. HYBRID REPOSITORIES 

In Section 5 we saw that, except for one particular case (MSR codes and high node on-line availabilities), MSR and MBR codes offer both lower storage costs and lower communication costs than simple file replication. However, there are some scenarios where the storage system needs to ensure that files can be accessed without the need for decoding operations. For example, storage infrastructures using replication [Ghemawat et al. 2003; Borthakur 2007] may not be able to afford a migration of their infrastructures from replication to erasure codes. Other examples are on-line streaming services or content distribution networks (CDNs) that need efficient access to stored files without requiring complex decoding operations.

As we saw in Section 5, maintaining whole file replicas (MSR codes with k = d = 1) has a higher storage cost than using coding schemes. However, when whole file replicas are required, storage systems can reduce this high cost by using a hybrid redundancy scheme that combines replication and erasure codes. The replicas can also help reduce the communication cost when repairing lost data by generating new redundant blocks using the on-line file replicas: generating a redundant block from a file replica requires transmitting α bytes instead of the γ = d · β bytes required by the normal repair process. From eqs. (1) and (2) it is easy to see that α ≤ γ. While some papers have studied hybrid redundancy schemes [Rodrigues and Liskov 2005; Haeberlen et al. 2005; Dimakis et al. 2007], their aim was to reduce communication costs and not to guarantee permanent access to replicated objects. Therefore, these papers assumed that only one replica of each file was maintained in the system, ignoring the two problems that arise when this replica goes temporarily off-line: (i) it is not possible to access the file without decoding operations, and (ii) repairs using the replica are not possible.
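To make the α versus γ = d · β comparison concrete, the following minimal sketch evaluates both quantities for an MSR code. It assumes the standard MSR operating point (α = M/k, β = M/(k(d - k + 1))); the function names are illustrative and not part of the paper.

```python
def msr_point(object_size, k, d):
    """Return (alpha, beta, gamma) for an MSR code.

    Assumes the standard MSR operating point:
    alpha = M/k (storage block), beta = M/(k(d-k+1)) (bytes sent by each
    of the d helper nodes), gamma = d*beta (total repair traffic).
    """
    if not 1 <= k <= d:
        raise ValueError("need 1 <= k <= d")
    alpha = object_size / k
    beta = object_size / (k * (d - k + 1))
    gamma = d * beta
    return alpha, beta, gamma


def replica_repair_saving(object_size, k, d):
    """Fraction of repair traffic saved when a block is rebuilt from a
    whole-file replica (alpha bytes) instead of from d helpers (gamma bytes)."""
    alpha, _, gamma = msr_point(object_size, k, d)
    return 1.0 - alpha / gamma


if __name__ == "__main__":
    # Object of 120 MB with k = 20, as in the simulations of Section 7.
    for d in (20, 36):
        alpha, beta, gamma = msr_point(120.0, 20, d)
        print(d, alpha, gamma, replica_repair_saving(120.0, 20, d))
```

For an MSR code the ratio α/γ equals (d - k + 1)/d, so the saving is largest for d = k and shrinks as d grows.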






L. Pamies-Juarez and E. Biersack 



In this section we evaluate a different hybrid scenario, where the storage system may maintain more than one replica of the whole file in order to ensure with high probability that there is always one replica on-line. However, it is not clear whether the overall communication costs of our hybrid scheme will be lower than the communication costs of a single replication scheme. Further, even if communication costs are reduced, the use of a double redundancy scheme (replication and coding) may increase storage costs. To the best of our knowledge, there is no prior work analyzing these aspects. In our analysis we differentiate between the probability p_low of having a file replica on-line, and the retrieve probability p of being able to retrieve a file using encoded blocks, which requires that k out of a total of n storage blocks are on-line. We assume that p_low < p, for example p_low = 0.99 and p = 0.999999. This is motivated by the fact that users are likely to tolerate higher access times to a file, which will need to be reconstructed first in the rare cases when no replicas are found on-line, but they require very strong guarantees that data is never lost.
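The number of whole-file replicas needed for a given p_low can be sketched as follows. The sketch assumes that nodes are on-line independently with probability a, so that r replicas give at least one on-line copy with probability 1 - (1 - a)^r; this model and the function name are assumptions made for illustration.

```python
def replicas_needed(a, p_low):
    """Smallest number of replicas r such that at least one replica is
    on-line with probability >= p_low, i.e. (1 - a)**r <= 1 - p_low.

    Assumes independent node availabilities (illustrative model).
    """
    if not (0.0 < a <= 1.0 and 0.0 <= p_low < 1.0):
        raise ValueError("need 0 < a <= 1 and 0 <= p_low < 1")
    r = 1
    while (1.0 - a) ** r > 1.0 - p_low:
        r += 1
    return r


if __name__ == "__main__":
    for a in (0.5, 0.75, 0.99):
        print(a, [replicas_needed(a, q) for q in (0.99, 0.98, 0.95)])
```

Under this model, less available nodes need noticeably more replicas for the same p_low, which is the cost that the hybrid scheme has to amortize.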

Adapting Communication Cost to the Hybrid Scheme. In a hybrid scheme we need to consider two types of repair traffic, namely (i) traffic W_MSR[k = d = 1] to repair lost replicas, and (ii) traffic W^repl to repair encoded blocks. Since in the hybrid scheme blocks are repaired directly from a replicated copy, repairing an encoded block requires transmitting only one new storage block of α bytes. We obtain W^repl by replacing in eq. (9) the term "traffic to repair a block" by α. Arranging the terms we obtain the following two expressions:

\[ W^{\mathrm{repl}}_{\mathrm{MSR}} = \frac{\eta[k,a,p]}{k\,a} \cdot \frac{\mathcal{M}\,O}{N\,E[L]} \tag{13} \]

\[ W^{\mathrm{repl}}_{\mathrm{MBR}} = \frac{2d \cdot \eta[k,a,p]}{k\,a\,(2d-k+1)} \cdot \frac{\mathcal{M}\,O}{N\,E[L]} \tag{14} \]

Note that these expressions assume that all lost blocks are repaired from replicas. Since we are 
adopting a proactive repair scheme, the system can delay individual repairs when no replicas are 
available. However, since replicas are available most of the time, these delays will rarely happen. 
Comparing W^repl_MSR and W^repl_MBR we can state the following lemma:

Lemma 6.1. For the same k, d and p parameters, a hybrid scheme using a MBR code has a 
communication cost that is at least as high as the communication cost of a hybrid scheme using a 
MSR code. 

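Proof. We can state the lemma as W^repl_MSR ≤ W^repl_MBR. Using equations (13) and (14) we obtain:

\begin{align*}
\frac{\eta[k,a,p]}{k\,a} \cdot \frac{\mathcal{M}\,O}{N\,E[L]} &\le \frac{2d \cdot \eta[k,a,p]}{k\,a\,(2d-k+1)} \cdot \frac{\mathcal{M}\,O}{N\,E[L]} \\
1 &\le \frac{2d}{2d-k+1} \\
2d-k+1 &\le 2d \\
1 &\le k,
\end{align*}

which is true by the definition of Regenerating Codes. □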

Lemma 6.1 implies that MSR codes, when used in hybrid schemes, are both more storage-efficient and more bandwidth-efficient than MBR codes. For this reason we will not consider the use of MBR codes in hybrid schemes.
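The size of the gap stated by Lemma 6.1 follows directly from dividing eq. (14) by eq. (13): all common factors cancel and the ratio is 2d/(2d - k + 1). A minimal sketch (the function name is illustrative) evaluates this ratio exactly:

```python
from fractions import Fraction


def hybrid_mbr_over_msr(k, d):
    """Ratio W^repl_MBR / W^repl_MSR obtained by dividing eq. (14) by
    eq. (13); the common factors eta, a, M, O, N and E[L] cancel out."""
    if not 1 <= k <= d:
        raise ValueError("need 1 <= k <= d")
    return Fraction(2 * d, 2 * d - k + 1)


if __name__ == "__main__":
    # Configuration used later in the simulations: k = 20, d = 36.
    print(hybrid_mbr_over_msr(20, 36))         # exact ratio as a fraction
    print(float(hybrid_mbr_over_msr(20, 36)))  # same ratio as a float
```

The ratio equals 1 only for k = 1 and grows with k, so the penalty of MBR codes in a hybrid scheme is larger for the high k values used with low node availabilities.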

Let us assume that the required retrieve probability for the whole hybrid system is p and that the retrieve probability for replicated objects is p_low, with p_low ≤ p. A hybrid scheme reduces the storage cost compared to replication when the following condition is satisfied:

\[ \underbrace{R_{\mathrm{MSR}}[k = 1;\, p_{\mathrm{low}}] + R_{\mathrm{MSR}}[k > 1;\, p]}_{\text{hybrid storage costs}} < \underbrace{R_{\mathrm{MSR}}[k = 1;\, p]}_{\text{replication storage costs}} \tag{15} \]
Table IV. Number of replicas required to achieve a retrieve probability p_low for different node availabilities a.

Node availability | Number of replicas required (p_low = 0.99, p_low = 0.98, p_low = 0.95)
0.75 | 3
0.99 | 1, 1




[Figure 6: three panels, each with p (no. nines) on the horizontal axis. (a) Storage efficient hybrid schemes (any d value). (b) Bandwidth efficient hybrid schemes (when d = k). (c) Bandwidth efficient hybrid schemes (when d = n - 1).]

Fig. 6. The (p, p_low)-pairs under each of the lines represent the scenarios where a hybrid scheme (replication + MSR codes) reduces the costs of a single replicated scheme. The lines are the maximum p_low values that satisfy eq. (15) for (a), and eq. (16) for (b) and (c).

And analogously, a hybrid scheme reduces communication costs when: 

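\[ \underbrace{W_{\mathrm{MSR}}[k = 1;\, p_{\mathrm{low}}] + W^{\mathrm{repl}}_{\mathrm{MSR}}[k > 1;\, p]}_{\text{hybrid comm. costs}} < \underbrace{W_{\mathrm{MSR}}[k = 1;\, p]}_{\text{replication comm. costs}} \tag{16} \]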

In Figure 6a we plot the maximum value for p_low that satisfies eq. (15) as a function of the overall retrieve probability p for different on-line node availabilities a. The k parameter is set to k = 50 when a = 0.5, k = 20 when a = 0.75 and k = 5 when a = 0.99. The (p, p_low)-pairs below each of the lines correspond to the hybrid instances that satisfy eq. (15), i.e. a hybrid scheme reduces the storage costs. Similarly, in Figures 6b and 6c we plot the (p, p_low)-pairs that satisfy eq. (16), i.e. a hybrid scheme reduces the communication costs.
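The storage condition can be explored numerically with a small sketch. Two assumptions are made here and are not taken from the paper: the number of blocks η[k,a,p] is the smallest n for which at least k of n independently available nodes are on-line with probability p, and the storage cost of an MSR code is measured by its redundancy factor n/k. Under these assumptions the sketch illustrates the shape of the condition; it is not expected to reproduce Figure 6 exactly.

```python
from math import comb


def retrieve_probability(n, k, a):
    """P(at least k of n nodes are on-line), nodes independent with
    availability a (binomial model, assumed for illustration)."""
    return sum(comb(n, i) * a**i * (1 - a) ** (n - i) for i in range(k, n + 1))


def min_blocks(k, a, p):
    """Smallest n with retrieve_probability(n, k, a) >= p."""
    n = k
    while retrieve_probability(n, k, a) < p:
        n += 1
    return n


def storage_cost(k, a, p):
    """Redundancy factor n/k of an MSR code (assumed cost measure);
    k = 1 corresponds to whole-file replication."""
    return min_blocks(k, a, p) / k


def hybrid_reduces_storage(k, a, p, p_low):
    """Evaluate the storage condition: replicas sized for p_low plus an
    MSR code sized for p, compared with replication sized for p."""
    hybrid = storage_cost(1, a, p_low) + storage_cost(k, a, p)
    return hybrid < storage_cost(1, a, p)


if __name__ == "__main__":
    print(hybrid_reduces_storage(20, 0.75, 0.999999, 0.99))
    print(hybrid_reduces_storage(5, 0.99, 0.99, 0.99))
```

With p_low equal to p the hybrid scheme can never win, since it pays for the same replicas plus the coded blocks; the benefit appears only once p is sufficiently larger than p_low.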

As an example, let us assume a storage system that wants 99% data availability for its replicated files. In this case (p_low = 0.99), looking at Figure 6 we see that a hybrid scheme (replication + MSR codes) can reduce the storage costs compared to replication only when p > 0.999999 for a = 0.99, when p > 0.99 for a = 0.75, and when p > 0.9 for a = 0.5. Since in general we always want strong guarantees that files are never lost (e.g., we assume p > 0.999999), we can conclude that hybrid schemes reduce storage and communication cost for almost all practical scenarios.

It is interesting to note that in Figure 6 all three sub-figures look very much alike. The reason is that the cost contribution of replication is significantly higher than the cost contribution of the coding (see Section 5). Since we have demonstrated the cost efficiency of a hybrid scheme for p_low = 0.99, which requires a larger number of replicas than configurations with p_low < 0.99 (see Table IV), a hybrid scheme will also reduce storage and communication costs for any system requiring fewer replicas, i.e., p_low < 0.99.

7. EXPERIMENTAL EVALUATION 

In previous sections we presented our generic storage model based on Regenerating Codes and we analyzed the storage and communication costs of MSR and MBR codes analytically, as well as the efficiency of using these codes in hybrid redundancy schemes. In this section, we aim to evaluate how the network traffic caused by repair processes can affect the performance and scalability of the redundancy scheme. To that end, we assume a distributed storage system constrained by its network bandwidth: a system where storage nodes have low upload bandwidth and low on-line availabilities. For such a storage system we will evaluate two measures that are difficult to obtain analytically: (i) the real bandwidth used by the repair process (i.e., the bandwidth utilization), and (ii) the repair time (i.e., the time required to download d fragments). In this way we can evaluate the impact of the repair degree d on bandwidth utilization and system scalability.

Bandwidth utilization. Given a node upload bandwidth, ω, and the per-node required bandwidth, W, we can theoretically state that a feasible storage system must satisfy ω ≥ W, and that the storage system reaches its maximum capacity when ω = W. However, practical storage systems may not reach this maximum capacity because of system inefficiencies due to failed repairs or fragment retransmissions. To measure these inefficiencies, we will compare the real bandwidth utilization measured in the simulator with the theoretical bandwidth utilization ρ = W/ω.

Repair time. The repair time is proportional to the repair bandwidth, γ, the repair degree, d, and the probability a of finding a node on-line [Pamies-Juarez et al. 2010]. We showed in Section 5 that increasing d reduces the repair bandwidth γ (see eqs. (1) and (2)), which should then intuitively reduce repair times. However, since the system only guarantees k on-line nodes, contacting d > k nodes may require waiting for nodes to come back on-line, which will cause longer repair times. In previous sections we only considered two repair degrees d, namely d = k and d = n - 1. In this section we will analyze how different d values affect repair times and bandwidth utilization.
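The waiting effect can be illustrated with a minimal sketch that computes the probability of finding at least d helper nodes on-line at the moment a repair starts. It assumes that the n - 1 other nodes holding blocks of the object are on-line independently with probability a; this binomial model and the function name are illustrative assumptions, not the paper's analysis.

```python
from math import comb


def prob_at_least_online(m, d, a):
    """P(at least d of m candidate helper nodes are on-line), assuming
    independent availabilities a (illustrative binomial model)."""
    if not 0 <= d <= m:
        raise ValueError("need 0 <= d <= m")
    return sum(comb(m, i) * a**i * (1 - a) ** (m - i) for i in range(d, m + 1))


if __name__ == "__main__":
    n, a = 47, 0.75          # configuration used in Section 7.1
    helpers = n - 1          # nodes other than the one being repaired
    for d in (20, 30, 34, 36, 40, 46):
        print(d, prob_at_least_online(helpers, d, a))
```

With 46 candidate helpers and a = 0.75 the expected number of on-line helpers is 34.5, so under this model a repair asking for d well below that value almost never waits, whereas one asking for d close to n - 1 almost always does.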

7.1. Simulator Set-Up 

We implemented an event-based simulator that simulates a dynamic storage infrastructure. Initially, the simulator starts with N = 500 storage nodes. New node arrivals follow a Poisson process with average inter-arrival times E[L]/N. Node departures follow a Poisson process with the same inter-departure time. Once a node joins the system it draws its lifetime from an exponential distribution L with expected value E[L] = 100 days. During their lifetime in the system, nodes alternate between on-line and off-line sessions. For each session, each node draws its on-line and off-line durations from distributions X_on and X_off respectively. In this paper X_on and X_off are exponential variates with parameters 1/(B · a) and 1/(B · (1 - a)) respectively, where B is the base time and a the node on-line availability. Using the mean value of the exponential distribution we can compute the average duration of the on-line and off-line periods as (in hours):

\[ E[X_{\mathrm{on}}] = B \cdot a \tag{17} \]

\[ E[X_{\mathrm{off}}] = B \cdot (1 - a) \tag{18} \]

The simulator implements a parameterized Regenerating Code. To cope with node failures, redundant blocks are repaired in a proactive manner following the algorithm defined in [Duminuco et al. 2007], and the simulator proactively generates new redundant blocks at a constant rate. For each stored object, a new redundant block is generated every E[L]/n days. To balance the amount of data assigned to each node, each repair is assigned to the on-line node that is least loaded in terms of the number of stored blocks and the number of ongoing repairs.

If the repair node disconnects during a repair process, the repair is aborted and restarted at another on-line node. Similarly, when a node uploading data disconnects, the partially uploaded data is discarded and the repair node starts a block retrieval from another on-line node.

The number of objects stored in the system is set in all the simulations to achieve a desired system utilization ρ. Given ρ, the number of stored objects, O, is obtained using the two following expressions:

\[ O_{\mathrm{MSR}} = \frac{\omega\,\rho\,a\,k\,(d-k+1)}{d \cdot \eta[k,a,p]} \cdot \frac{N\,E[L]}{\mathcal{M}} \tag{19} \]

\[ O_{\mathrm{MBR}} = \frac{\omega\,\rho\,a\,k\,(2d-k+1)}{2d \cdot \eta[k,a,p]} \cdot \frac{N\,E[L]}{\mathcal{M}} \tag{20} \]

These formulas are obtained by taking the definition of utilization, ρ = W/ω, replacing W by ρ · ω in eq. (9) and solving the equation for O.

Fig. 7. Bandwidth utilization and repair times for MSR and MBR and different repair degrees d when the object size is M = 120 MB and the number of objects O is set to achieve half bandwidth utilization ρ = 0.5. The rest of the parameters are set to: k = 20 and B = 24 hours.

We set the on-line node availability to a = 0.75 and we set k = 20. With these values, we 
use eq. dU to compute the minimum number of redundant blocks, n, required to achieve a retrieve 
probability p = 0.999999: η[20, 0.75, 0.999999] = 47.
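The two sizing steps above (choosing n, then choosing O) can be sketched as follows. The sketch assumes that η[k,a,p] is the smallest n for which at least k of n independently available nodes are on-line with probability p; the paper's own definition is given in an earlier section, and under this binomial assumption the search returns the value 47 quoted above. The object counts follow eqs. (19) and (20); function and parameter names are illustrative, and all quantities must be expressed in consistent units (for example bytes and seconds).

```python
from math import comb


def retrieve_probability(n, k, a):
    """P(at least k of n nodes are on-line) under an assumed binomial model."""
    return sum(comb(n, i) * a**i * (1 - a) ** (n - i) for i in range(k, n + 1))


def min_blocks(k, a, p):
    """Smallest n such that any k of n blocks are on-line with probability >= p."""
    n = k
    while retrieve_probability(n, k, a) < p:
        n += 1
    return n


def objects_msr(omega, rho, a, k, d, n, num_nodes, mean_lifetime, object_size):
    """Number of objects for a target utilization rho, following eq. (19)."""
    return (omega * rho * a * k * (d - k + 1)) / (d * n) * num_nodes * mean_lifetime / object_size


def objects_mbr(omega, rho, a, k, d, n, num_nodes, mean_lifetime, object_size):
    """Number of objects for a target utilization rho, following eq. (20)."""
    return (omega * rho * a * k * (2 * d - k + 1)) / (2 * d * n) * num_nodes * mean_lifetime / object_size


if __name__ == "__main__":
    k, a, p = 20, 0.75, 0.999999
    n = min_blocks(k, a, p)
    print("n =", n)
    # Relative capacity gain of d = 36 over d = 20 (unit choices cancel out).
    args = dict(omega=1.0, rho=0.5, a=a, k=k, n=n,
                num_nodes=500, mean_lifetime=1.0, object_size=1.0)
    print(objects_msr(d=36, **args) / objects_msr(d=20, **args))
```

Because every factor except the one depending on d cancels in the ratio, the capacity gain of d = 36 over d = 20 for MSR codes is (17/36)/(1/20), roughly 9.4, which is in line with the object counts reported in Section 7.3 (6452 versus 683).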

Finally, the node upload bandwidth is set to ω = 20 KB/sec, allowing only one concurrent upload per node. To simulate asymmetric network bandwidth, we allow up to 3 concurrent downloads per node, which gives a maximum download bandwidth of 60 KB/sec.
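The node session model of this set-up can be sketched in a few lines. This is not the authors' simulator: it only draws on-line and off-line durations from exponential distributions with means B · a and B · (1 - a), checks that the resulting long-run availability is close to a, and computes the proactive repair interval E[L]/n. All function names are hypothetical.

```python
import random


def sample_sessions(base_time, a, count, rng):
    """Draw `count` (on-line, off-line) duration pairs, in the unit of
    base_time, from exponential distributions with means B*a and B*(1-a)."""
    on_rate = 1.0 / (base_time * a)           # rate parameter 1/(B*a)
    off_rate = 1.0 / (base_time * (1.0 - a))  # rate parameter 1/(B*(1-a))
    return [(rng.expovariate(on_rate), rng.expovariate(off_rate))
            for _ in range(count)]


def observed_availability(sessions):
    """Fraction of time spent on-line over the sampled sessions."""
    on_total = sum(on for on, _ in sessions)
    off_total = sum(off for _, off in sessions)
    return on_total / (on_total + off_total)


def repair_interval_days(mean_lifetime_days, n):
    """Interval between proactive block generations per object, E[L]/n."""
    return mean_lifetime_days / n


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed for a repeatable run
    sessions = sample_sessions(base_time=24.0, a=0.75, count=20000, rng=rng)
    print("observed availability:", observed_availability(sessions))
    print("repair interval (days):", repair_interval_days(100.0, 47))
```

With B = 24 hours and a = 0.75 the mean on-line session lasts 18 hours and the mean off-line session 6 hours, so a node completes one on/off cycle per day on average.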



7.2. Impact of the Repair Degree d 

In Figure 7 we measure the effect of the repair degree on the system utilization and on the repair times. In this experiment, we set the size of the object to M = 120 MB and the base time to B = 24 hours, i.e. on average nodes connect and disconnect once per day. The number of stored objects is set to achieve a bandwidth utilization of ρ = 0.5. Figure 7c shows the number of objects O for ρ = 0.5, and Figure 7d the storage space required. Figures 7a and 7b show that small d values (values close to k = 20) keep the bandwidth utilization on target and ensure low repair times. However, for repair degrees d > 34 the repair times start to increase exponentially.
Fig. 8. Bandwidth utilization and repair times for MSR and MBR and different base times B when the object size is M = 120 MB and the number of objects O is set to achieve a bandwidth utilization ρ = 0.5. The rest of the parameters are set to: k = 20 and d = 36. For the MSR case O = 5069, and for MBR O = 10984.



It is interesting to see that when the repair times are quite long, nodes executing repairs may not finish their repairs before disconnecting, since repair times become longer than on-line sessions. In this case, failed repairs are reallocated and restarted on other on-line nodes. These unsuccessful repairs cause useless traffic that increases the real bandwidth utilization. In Figure 7a we can see how for d > 38 repair times start to be longer than on-line sessions, increasing utilization beyond 0.5. It is important to note that these longer repair times can jeopardize the reliability of the system: large d values can cause most repairs to fail, reducing the number of available blocks and reducing the probability of successfully accessing stored files.

To investigate the increase of bandwidth utilization in detail, we analyze in Figure 8 the performance of the storage system at the point where repair times begin to increase, d = 36. At this point we evaluate repair times and bandwidth utilization for different base times, B. As B increases, on-line sessions become longer and fewer repairs need to be restarted, theoretically reducing bandwidth utilization. We can see this effect in Figure 8a: larger base times reduce the bandwidth utilization of the system. Due to this utilization reduction, repair times are also slightly reduced, as we can see in Figure 8b.

7.3. Scalability 

Besides the impact of the repair degree d and the base time B, we aim to analyze the behavior of the storage system under different target bandwidth utilizations. In Figure 9 we plot the measured utilization and repair times for a wide range of target utilizations ρ. We set the size of the stored objects to 120 MB and we increase the number of stored objects, O, to achieve different utilizations. In this scenario we set k = d = 20. In Figure 9a we see how the measured utilization is nearly the same as the target utilization. This is because d = k causes short repair times and repairs typically finish before nodes go off-line. However, in Figure 9b we can see how for a high bandwidth utilization of ρ = 0.9, the saturation of the node upload queues increases repair times significantly.

In Figure 10 we plot the same metrics as in Figure 9 but for a repair degree of d = 36. Increasing the repair degree causes longer retrieval times; however, as we saw in Figure 7, d = 36 keeps repairs short enough to guarantee that the utilization is not affected. Moreover, by increasing the repair degree from d = 20 to d = 36 we can store on the same system configuration one order of magnitude more objects, namely 6452 (MSR, d = 36) instead of 683 (MSR, d = 20).

Finally, in Figure 11 we analyze the impact of object size on bandwidth utilization and repair times. For each object size we set the number of stored objects to achieve a target bandwidth utilization of ρ = 0.5. Since the utilization is the same for all object sizes, the number of stored objects, O, decreases as the object size increases (Figure 11). Independently of the object size, the total amount of stored data, O × M, remains constant: 774 GB for MSR codes and 1206 GB for MBR codes. We can also see in Figure 11a that the measured bandwidth utilization is independent of the object size. However, as expected, we can see in Figure 11b that larger objects take longer to repair.

Fig. 9. Bandwidth utilization and repair times for MSR and MBR and different targeted utilizations ρ when the object size is M = 120 MB and the number of objects O is set to achieve the targeted ρ. The rest of the parameters are set to: k = 20, B = 24 hours and d = 20.

8. CONCLUSIONS 

In this paper we evaluated redundancy schemes for distributed storage systems in order to gain a clearer understanding of their cost trade-offs. Specifically, we analyzed the performance of the generic family of erasure codes called Regenerating Codes [Dimakis et al. 2007], and the use of Regenerating Codes in hybrid redundancy schemes. For each parameter combination of Regenerating Codes we analytically derived its storage and communication costs. Our cost analysis is novel in that it takes into account the effects of on-line node availabilities and node lifetimes. Additionally, we used an event-based simulator to evaluate the effects of network utilization on the scalability of different redundancy configurations. Our main results are as follows:

— Compared to simple replication, the use of Regenerating Codes can reduce the costs of a storage
system (storage and communication costs) from 20% up to 80%.

— The optimal value of the retrieval degree k depends on the on-line node availability, ranging from 
k = 5 when nodes have 99% availability, to k = 50 when nodes have 50% availability. Once
k is fixed, storage systems with limited storage capacity can maximize their storage capacity by
adopting MSR codes. On the other hand, systems with limited communications bandwidth can 
maximize their storage capacity by adopting MBR codes. 

— High repair degrees d reduce the overall communication costs but may increase repair times 
significantly, which can lead to data loss. We experimentally found that the repair degree should 
be small enough to make sure the repair times are shorter than the on-line session durations of 
nodes. 
Fig. 10. Bandwidth utilization and repair times for MSR and MBR and different targeted utilizations ρ when the object size is M = 120 MB and the number of objects O is set to achieve the targeted ρ. The rest of the parameters are set to: k = 20, B = 24 hours and d = 36.

— Finally, in storage systems where the access to whole file replicas is required, we showed that 
hybrid schemes combining replication and MSR codes are more cost efficient than simple repli- 
cation. 